WHAT IS CLAIMED IS: 

1. An arithmetic circuit including at least one borrow parallel counter and at least one 4-bit
one-hot digital signal, said circuit achieving high performance while expending low power, said
circuit comprising: 

a full-adder, which adds three bits represented by two 4-b 1-hot signals and a binary signal 
respectively without intermediate conversion. 

2. The arithmetic circuit of claim 1, wherein said borrow parallel counter is constructed of 
Complementary Metal Oxide Semiconductor (CMOS) and uses greater weighted input bits. 

3. The arithmetic circuit of claim 1, wherein a very large scale integration (VLSI) design is
improved by increasing speed of a calculation performed by said arithmetic circuit, decreasing
area-transistor count, improving nMOS/pMOS ratio, and increasing power dissipation.

4. The arithmetic circuit of claim 1, wherein said circuit includes lower switching activity and 
use of fewer hot lines as compared with a binary circuit for use in low-power high-performance 
arithmetic applications. 

5. A multiplier circuit including borrow parallel multiplier circuits and virtual multiplier 
circuits using borrow parallel counters providing low-power, high-speed, and small-area features, said 
multiplier comprising: 

regular and unified layouts for small multipliers of n x n, where 3<n<9 including a single array 
of almost identical borrow counters; 

reduced line connections including partial product bits generations and their connections to the 
bit reduction networks; and 

a substantially same delay for almost all output bits, wherein transistor sizing and delay 
equalization is minimized. 

6. The multiplier circuit of claim 5, wherein a "borrow-effect" re-arranges input bits to be 
processed so that the actual bits to each column are balanced and equal. 

7. The multiplier circuit of claim 5, wherein a total length of line connections in said multiplier 
is minimized due to only a single counter being used in each column.

8. A multiplier triple-expansion non-Booth circuit comprising a partial product bit matrix
decomposition circuit for efficient generation of large multipliers from smaller multipliers, wherein
each expansion triples the size of the large multipliers.

9. The circuit of claim 8, further minimizing inter-connections and being self-testable at
high-speed and low-power, and having high VLSI performance without an extra built-in test circuit and
complex wiring.

10. The circuit of claim 8, wherein said multipliers have only about 9% to 20% more 
transistors than minimum existing Booth multipliers. 

11. The circuit of claim 8, wherein said circuit is used in pipelined and multiply-accumulate
(MAC) processors for performing natural four stage operations selected from one of base virtual
multiplication, level-1, level-2 bit reductions and the fast final addition.

12. The circuit of claim 11, wherein said circuit further performs natural four stage
operations with equalized delays. 

13. A multiplier circuit utilizing 4-b 1-hot encoded signals and borrow bits, the circuit 
comprising: 

at least two input numbers, each of said input numbers being trisected into three segments; 
a plurality of Carry Select Adders (CSAs); 

a plurality of multipliers interconnected to the CSAs, said multipliers being arranged to 
minimize the interconnection to the CSAs; and 
a plurality of output bits. 

14. The multiplier circuit of claim 13, further comprising a plurality of levels of 3:2 and 4:2
counters and a latch for each of said output bits. 

15. The multiplier circuit of claim 13, wherein a 54 x 54-b pipelined multiplier is implemented
in an area of 434.8 x 769.5 = 334,578.6 μm² with a 0.18 μm technology, achieving 1 GHz at a 1.8 V supply
and a low-power performance.

16. The multiplier circuit of claim 13, wherein at least 9 multipliers are used, said multipliers
being selected from one of

6 x 6-b (4, 2)-(3, 2) based virtual multiplier totaling 18 x 18-b, and 
6 x 6-b borrow parallel virtual multiplier totaling 18 x 18-b.
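
As an illustration of the trisection recited in claims 13 and 16, the following Python sketch models only the arithmetic of assembling an 18 x 18-b product from nine 6 x 6-b partial products. The names `trisect` and `multiply_by_trisection` and the default 6-bit segment width are assumptions chosen for the example; the sketch does not model the CSAs, the 1-hot signals, or the borrow bits.

```python
SEGMENT_BITS = 6  # assumed width: an 18-b operand split into three 6-b segments


def trisect(value, segment_bits=SEGMENT_BITS):
    """Split a non-negative integer into three segments, least significant first."""
    mask = (1 << segment_bits) - 1
    return [(value >> (i * segment_bits)) & mask for i in range(3)]


def multiply_by_trisection(x, y, segment_bits=SEGMENT_BITS):
    """Sum the nine segment-by-segment products, each shifted to its weight."""
    limit = 1 << (3 * segment_bits)
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("operands must fit in three segments")
    xs = trisect(x, segment_bits)
    ys = trisect(y, segment_bits)
    product = 0
    for i, x_seg in enumerate(xs):
        for j, y_seg in enumerate(ys):
            # Segment i of x times segment j of y carries weight 2^((i + j) * segment_bits).
            product += (x_seg * y_seg) << ((i + j) * segment_bits)
    return product
```

Calling the same function with `segment_bits=18` would assemble a 54 x 54-b product from nine 18 x 18-b products, which is one way to read the statement that each expansion triples the operand width.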

17. The multiplier circuit of claim 13, wherein fewer transistors for signal type conversion 
from non-binary to binary are required. 

18. The multiplier circuit of claim 13, wherein said CSAs are 4-b 1-hot borrow parallel
counters including a 5_1 counter, wherein said 5_1 counter uses 78 transistors, about two thirds being
nMOS transistor cells, and 56 transistors being used to pass 4-b 1-hot signals, thereby reducing
power-consuming activities.

19. The multiplier circuit of claim 18, wherein said CSAs implement equations

A1 + A2 + A3 + A4 + 2A5 = s0 + 2s1 + 4Q;

Xo = s0;

Yo = Xi XOR s1;

Zo = Xi;

S = Yi XOR Q; and

C = Zi AND Yi' OR Q AND Yi, where A1-A5 are input bits with A5 being a borrow bit; s0, s1
and Q are temporary parameters; and Xo, Yo, Zo and Xi, Yi, Zi are in-stage carry (out/in) bits.
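
The equations of claim 19 can be exercised with a small bit-level model. The Python sketch below is a behavioral illustration only, under two assumptions not stated in the claim: that Yi' denotes the complement of Yi, and that AND binds tighter than OR in the expression for C. The function names `decompose` and `stage_outputs` are hypothetical.

```python
def decompose(a1, a2, a3, a4, a5):
    """Return (s0, s1, q) with a1 + a2 + a3 + a4 + 2*a5 == s0 + 2*s1 + 4*q.

    a5 is the borrow bit and carries weight 2, so the total is at most 6
    and fits in three bits.
    """
    total = a1 + a2 + a3 + a4 + 2 * a5
    return total & 1, (total >> 1) & 1, (total >> 2) & 1


def stage_outputs(s0, s1, q, xi, yi, zi):
    """Evaluate the in-stage equations as written in claim 19."""
    xo = s0
    yo = xi ^ s1
    zo = xi
    s = yi ^ q
    # Assumed reading: C = (Zi AND NOT Yi) OR (Q AND Yi), i.e. Yi selects Q or Zi.
    c = (zi & (1 - yi)) | (q & yi)
    return xo, yo, zo, s, c
```

Under the assumed reading, C behaves as a two-way selection controlled by Yi: it equals Q when Yi is 1 and Zi when Yi is 0.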

20. A small borrow parallel multiplier circuit for processing a plurality of bit inputs, the 
multiplier comprising: 

an array including a plurality of identical counters with a simple layout arranged in a plurality of 
columns, wherein "borrow-effect" naturally re-arranges bits being processed so that an actual number 
of bits processed in each column are balanced; 

minimal line connections within each line, wherein a single counter is used in each column; and 
a plurality of output bits having similar delay, wherein said multiplier requires little cost in
transistor sizing and delay equalization.

21. The multiplier circuit of claim 20, wherein said delay is selected from one of about 0.6ns 
and 2 times a (4, 2) delay.

22. The multiplier circuit of claim 20, wherein said multiplier has the same height as a single
5_1 counter, providing extra regularity and compact layout.

23. The multiplier circuit of claim 20, wherein a 6 x 6 multiplier implemented in 0.18 μm
CMOS technology has an area of 12.87 x 16.0 μm² when using a 5_1 counter and an area of 26.5 x
85.5 μm² when using a 5_1 1 counter.

24. The multiplier circuit of claim 20, wherein a CSA block of an 18 x 18 multiplier has an 
area of about 34.2 x 85.5 x 3 μm².

25. The multiplier circuit of claim 20, wherein a CSA block of a 54 x 54 multiplier has an area 
of about 48.7 x 85.5 x 9 μm².

26. The multiplier circuit of claim 20, wherein a 54 x 54 multiplier including a CSA block has 
a layout in a rectangular area with a height of ((26.5 + 5) x 3 + 34.2) x 3 + 48.7 = 434.8 μm and a width
of 85.5 x 9 = 769.5 μm, equaling an area of 434.8 x 769.5 = 334,578.6 μm².
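
The layout arithmetic of claim 26 can be checked numerically. The Python sketch below simply re-evaluates the expression in the claim; the function name `layout_dimensions_um` and the interpretation given in the comments (which figure belongs to which block) are assumptions drawn from claims 23 to 25, and the claims do not explain the added term 5.

```python
def layout_dimensions_um():
    """Re-evaluate the height, width and area expression of claim 26 (micrometres)."""
    small_height = 26.5    # height figure given for the 6 x 6 multiplier in claim 23
    extra = 5.0            # added term in claim 26; its meaning is not stated in the claims
    csa_18_height = 34.2   # figure given for the 18 x 18 CSA block in claim 24
    csa_54_height = 48.7   # figure given for the 54 x 54 CSA block in claim 25
    unit_width = 85.5      # width figure shared by claims 23 to 25

    height = ((small_height + extra) * 3 + csa_18_height) * 3 + csa_54_height
    width = unit_width * 9
    return height, width, height * width


if __name__ == "__main__":
    h, w, area = layout_dimensions_um()
    print(f"height = {h:.1f} um, width = {w:.1f} um, area = {area:.1f} um^2")
```

Rounded to one decimal place, the three values agree with the 434.8, 769.5 and 334,578.6 stated in the claim.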

27. The multiplier circuit of claim 20, wherein components of said multiplier are modular and 
repeated, a low-power and pipeline frequency of 1GHz is achieved, and said multiplier is self-testable, 
as provided by a triple expansion logic scheme. 

28. A method of optimizing only one column of a plurality of CSA block columns in a triple 
expansion scheme of a multiplier for processing a plurality of bit inputs, the method comprising the 
steps of: 

providing a first level of application of a triple expansion scheme P x P, where P is (3m+z1), m
is an integer multiplier, and z1 is {0, 1, -1}; and

expanding the first level of application according to an E x E, where E is (3P+z2) and z2 is {0, 1, -1}.

29. The method of claim 28, wherein m=4, z1=-1, and z2=-1.

30. The method of claim 28, wherein m=6, z1=0, and z2=0.

31. The method of claim 28, wherein m=7, z1=0, and z2=1.

32. The method of claim 28, wherein m=5, z1=0, and z2=-1.

33. The method of claim 28, wherein m=8, z1=0, and z2=0.

34. The method of claim 28, wherein m=9, z1=0, and z2=0.
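
The two expansion levels of claim 28 reduce to a pair of linear formulas, P = 3m + z1 and E = 3P + z2, with the two offsets written here as z1 and z2. The Python sketch below evaluates them for a few of the parameter sets recited in the dependent claims; the function name `expansion_sizes` is hypothetical and the sketch covers only the operand widths, not the circuit.

```python
ALLOWED_OFFSETS = (0, 1, -1)  # values permitted for z1 and z2 in claim 28


def expansion_sizes(m, z1, z2):
    """Return (P, E): operand widths after the first and second triple expansion."""
    if z1 not in ALLOWED_OFFSETS or z2 not in ALLOWED_OFFSETS:
        raise ValueError("z1 and z2 must each be 0, 1 or -1")
    p = 3 * m + z1   # first level: P x P multiplier built from m x m multipliers
    e = 3 * p + z2   # second level: E x E multiplier built from P x P multipliers
    return p, e


if __name__ == "__main__":
    # Parameter sets taken from claims 30, 31, 33 and 34.
    for m, z1, z2 in [(6, 0, 0), (7, 0, 1), (8, 0, 0), (9, 0, 0)]:
        p, e = expansion_sizes(m, z1, z2)
        print(f"m={m}, z1={z1}, z2={z2}: P={p}, E={e}")
```

For example, m=6 with both offsets 0 gives P=18 and E=54, matching the 18 x 18 and 54 x 54 multipliers discussed in the earlier claims, and m=7 with z2=1 gives E=64.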